Locally Decodable Codes From Nice Subsets of Finite Fields 
and Prime Factors of Mersenne Numbers 



Kiran S. Kedlaya, MIT, kedlaya@mit.edu
Sergey Yekhanin, MIT, yekhanin@mit.edu


Abstract

A k-query Locally Decodable Code (LDC) encodes an n-bit message x as an N-bit codeword C(x), such that
one can probabilistically recover any bit $x_i$ of the message by querying only k bits of the codeword C(x), even
after some constant fraction of codeword bits has been corrupted. The major goal of LDC-related research is to
establish the optimal trade-off between length and query complexity of such codes.

Recently [34] introduced a novel technique for constructing locally decodable codes and vastly improved the
upper bounds for code length. The technique is based on Mersenne primes. In this paper we extend the work
of [34] and argue that further progress via these methods is tied to progress on an old number theory question
regarding the size of the largest prime factors of Mersenne numbers.

Specifically, we show that every Mersenne number $m = 2^t - 1$ that has a prime factor $p > m^{\gamma}$ yields a family
of $k(\gamma)$-query locally decodable codes of length $\exp(n^{1/t})$. Conversely, if for some fixed k and all $\epsilon > 0$ one can
use the technique of [34] to obtain a family of k-query LDCs of length $\exp(n^{\epsilon})$; then infinitely many Mersenne
numbers have prime factors larger than currently known.


1 Introduction 


Classical error-correcting codes allow one to encode an n-bit string x into an N-bit codeword C(x), in such
a way that x can still be recovered even if C(x) gets corrupted in a number of coordinates. It is well-known
that codewords C(x) of length N = O(n) already suffice to correct errors in up to $\delta N$ locations of C(x) for
any constant $\delta < 1/4$. The disadvantage of classical error-correction is that one needs to consider all or most
of the (corrupted) codeword to recover anything about x. Now suppose that one is only interested in recovering
one or a few bits of x. In such a case more efficient schemes are possible. Such schemes are known as locally
decodable codes (LDCs). Locally decodable codes allow reconstruction of an arbitrary bit $x_i$ from looking only
at k randomly chosen coordinates of C(x), where k can be as small as 2. Locally decodable codes have numerous
applications in complexity theory [15, 29], cryptography [6, 11] and the theory of fault tolerant computation [24]. 
Below is a slightly informal definition of LDCs: 

A $(k, \delta, \epsilon)$-locally decodable code encodes n-bit strings to N-bit codewords C(x), such that for every $i \in [n]$,
the bit $x_i$ can be recovered with probability $1 - \epsilon$, by a randomized decoding procedure that makes only k queries,
even if the codeword C(x) is corrupted in up to $\delta N$ locations.

One should think of $\delta > 0$ and $\epsilon < 1/2$ as constants. The main parameters of interest in LDCs are the length
N and the query complexity k. Ideally we would like to have both of them as small as possible. The concept
of locally decodable codes was explicitly discussed in various papers in the early 1990s [2, 28, 21]. Katz and
Trevisan [15] were the first to provide a formal definition of LDCs. Further work on locally decodable codes
includes [3, 8, 20, 4, 16, 30, 34, 33, 14, 23].

Below is a brief summary of what was known regarding the length of LDCs prior to [34]. The length of optimal
2-query LDCs was settled by Kerenidis and de Wolf in [16] and is exp(n).¹ The best upper bound for the length
of 3-query LDCs was $\exp(n^{1/2})$ due to Beimel et al. [3], and the best lower bound is $\tilde{\Omega}(n^2)$ [33]. For general
(constant) k the best upper bound was $\exp(n^{O(\log\log k/(k \log k))})$ due to Beimel et al. [4] and the best lower bound
is $\tilde{\Omega}(n^{1 + 1/(\lceil k/2 \rceil - 1)})$ [33].

The recent work [34] improved the upper bounds to the extent that it changed the common perception of what
may be achievable [12, 11]. [34] introduced a novel technique to construct codes from so-called nice subsets
of finite fields and showed that every Mersenne prime $p = 2^t - 1$ yields a family of 3-query LDCs of length
$\exp(n^{1/t})$. Based on the largest known Mersenne prime [9], this translates to a length of less than $\exp(n^{10^{-7}})$.
Combined with the recursive construction from [4], this result yields vast improvements for all values of k > 2. It
has often been conjectured that the number of Mersenne primes is infinite. If indeed this conjecture holds, [34] gets
three query locally decodable codes of length $N = \exp(n^{O(1/\log\log n)})$ for infinitely many n. Finally, assuming
that the conjecture of Lenstra, Pomerance and Wagstaff [31, 22, 32] regarding the density of Mersenne primes
holds, [34] gets three query locally decodable codes of length $N = \exp(n^{O(1/\log^{1-\epsilon}\log n)})$ for all n, for every
$\epsilon > 0$.



1.1 Our results 



In this paper we address two natural questions left open by [34]:

1. Are Mersenne primes necessary for the constructions of [34]?

2. Has the technique of [34] been pushed to its limits, or can one construct better codes through a more clever
choice of nice subsets of finite fields?

We extend the work of [34] and answer both of the questions above. In what follows let P(m) denote the
largest prime factor of m. We show that one does not necessarily need to use Mersenne primes. It suffices to have
Mersenne numbers with polynomially large prime factors. Specifically, every Mersenne number $m = 2^t - 1$ such
that $P(m) > m^{\gamma}$ yields a family of $k(\gamma)$-query locally decodable codes of length $\exp(n^{1/t})$. A partial converse
also holds. Namely, if for some fixed $k \ge 3$ and all $\epsilon > 0$ one can use the technique of [34] to (unconditionally)
obtain a family of k-query LDCs of length $\exp(n^{\epsilon})$; then for infinitely many t we have

$$P(2^t - 1) \ge (t/2)^{1 + 1/(k-2)}. \tag{1}$$

The bound (1) may seem quite weak in light of the widely accepted conjecture saying that the number of
Mersenne primes is infinite. However (for any $k \ge 3$) this bound is substantially stronger than what is currently
known unconditionally. Lower bounds for $P(2^t - 1)$ have received a considerable amount of attention in the
number theory literature [25, 26, 10, 27, 19, 18]. The strongest result to date is due to Stewart [27]. It says that
for all integers t ignoring a set of asymptotic density zero, and for all functions $\epsilon(t) > 0$ where $\epsilon(t)$ tends to zero
monotonically and arbitrarily slowly:

$$P(2^t - 1) > \epsilon(t)\, t\, (\log t)^2 / \log\log t. \tag{2}$$

¹ Throughout the paper we use the standard notation $\exp(x) = 2^{O(x)}$.



There are no better bounds known to hold for infinitely many values of t, unless one is willing to accept some
number theoretic conjectures [19, 18]. We hope that our work will further stimulate the interest in proving lower
bounds for $P(2^t - 1)$ in the number theory community.

In summary, we show that one may be able to improve the unconditional bounds of [34] (say, by discovering a
new Mersenne number with a very large prime factor) using the same technique. However any attempts to reach
the $\exp(n^{\epsilon})$ length for some fixed query complexity and all $\epsilon > 0$ require either progress on an old number theory
problem or some radically new ideas.

In this paper we deal only with binary codes for the sake of clarity of presentation. We remark however that 
our results as well as the results of [34] can be easily generalized to larger alphabets. Such generalization will be 
discussed in detail in [35]. 

1.2 Outline 

In section 3 we introduce the key concepts of [34], namely that of combinatorial and algebraic niceness of 
subsets of finite fields. We also briefly review the construction of locally decodable codes from nice subsets. In 
section 4 we show how Mersenne numbers with large prime factors yield nice subsets of prime fields. In section 5 
we prove a partial converse. Namely, we show that every finite field $\mathbb{F}_q$ containing a sufficiently nice subset is an
extension of a prime field $\mathbb{F}_p$, where p is a large prime factor of a large Mersenne number. Our main results are
summarized in sections 4.3 and 5.4. 

2 Notation 

We use the following standard mathematical notation: 

• $[a] = \{1, \ldots, a\}$;

• $\mathbb{Z}_n$ denotes integers modulo n;

• $\mathbb{F}_q$ is a finite field of q elements;

• $d_H(x, y)$ denotes the Hamming distance between binary vectors x and y;

• (u, v) stands for the dot product of vectors u and v;

• For a linear space $L \subseteq \mathbb{F}_2^m$, $L^{\perp}$ denotes the dual space. That is, $L^{\perp} = \{u \in \mathbb{F}_2^m \mid \forall v \in L,\ (u, v) = 0\}$;

• For an odd prime p, $\mathrm{ord}_2(p)$ denotes the smallest positive integer t such that $p \mid 2^t - 1$.
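
The quantity $\mathrm{ord}_2(p)$ from the last item is used throughout the paper, so a small computational illustration may help. The following is a minimal sketch that finds the order by repeated doubling modulo p; the function name `ord2` is ours and does not appear in the paper.

```python
def ord2(p: int) -> int:
    """Smallest positive t such that p divides 2^t - 1, for an odd integer p > 1."""
    if p < 3 or p % 2 == 0:
        raise ValueError("p must be an odd integer greater than 1")
    t, power = 1, 2 % p
    # Keep doubling modulo p until we return to 1.
    while power != 1:
        power = (power * 2) % p
        t += 1
    return t
```

For a Mersenne prime $p = 2^t - 1$ the function returns t itself; for other odd primes it returns the exponent of the smallest Mersenne number divisible by p.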

3 Nice subsets of finite fields and locally decodable codes 

In this section we introduce the key technical concepts of [34], namely that of combinatorial and algebraic 
niceness of subsets of finite fields. We briefly review the construction of locally decodable codes from nice 
subsets. Our review is concise although self-contained. We refer the reader interested in a more detailed and 
intuitive treatment of the construction to the original paper [34]. We start by formally defining locally decodable 
codes. 

Definition 1 A binary code $C : \{0, 1\}^n \to \{0, 1\}^N$ is said to be $(k, \delta, \epsilon)$-locally decodable if there exists a
randomized decoding algorithm $\mathcal{A}$ such that

1. For all $x \in \{0, 1\}^n$, $i \in [n]$ and $y \in \{0, 1\}^N$ such that $d_H(C(x), y) \le \delta N$: $\Pr[\mathcal{A}^y(i) = x_i] \ge 1 - \epsilon$, where
the probability is taken over the random coin tosses of the algorithm $\mathcal{A}$.

2. $\mathcal{A}$ makes at most k queries to y.
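
Definition 1 can be made concrete with a classical example that is not taken from this paper: the Hadamard code, which encodes n bits into $2^n$ bits and is decodable with 2 queries. The sketch below is only an illustration of the definition; the names `hadamard_encode` and `decode_bit` are ours.

```python
import random

def hadamard_encode(x):
    """Codeword bit at position w (an n-bit mask) is the parity of <x, w>."""
    n = len(x)
    xmask = sum(bit << j for j, bit in enumerate(x))
    return [bin(xmask & w).count("1") % 2 for w in range(2 ** n)]

def decode_bit(y, n, i, rng=random):
    """Recover x_i with two queries: y[w] XOR y[w ^ e_i] for a random w."""
    w = rng.randrange(2 ** n)
    return y[w] ^ y[w ^ (1 << i)]
```

Each of the two queried positions is uniformly distributed, so if at most a $\delta$ fraction of y is corrupted, a union bound shows that the decoder errs with probability at most $2\delta$. The price is the exponential length, which is what the constructions discussed in this paper aim to reduce for $k \ge 3$.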

We now introduce the concepts of combinatorial and algebraic niceness of subsets of finite fields. Our definitions
are syntactically slightly different from the original definitions in [34]. We prefer these formulations since
they are more appropriate for the purposes of the current paper. In what follows let $\mathbb{F}_q^*$ denote the multiplicative
group of $\mathbb{F}_q$.

Definition 2 A set $S \subseteq \mathbb{F}_q^*$ is called t combinatorially nice if for some constant c > 0 and every positive integer
m there exist two $n = \lfloor c m^t \rfloor$-sized collections of vectors $\{u_1, \ldots, u_n\}$ and $\{v_1, \ldots, v_n\}$ in $\mathbb{F}_q^m$, such that

• For all $i \in [n]$, $(u_i, v_i) = 0$;

• For all $i, j \in [n]$ such that $i \ne j$, $(u_j, v_i) \in S$.

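Definition 3 A set $S \subseteq \mathbb{F}_q^*$ is called k algebraically nice if k is odd and there exists an odd $k' \le k$ and two sets
$S_0, S_1 \subseteq \mathbb{F}_q$ such that

• $S_0$ is not empty;

• $|S_1| = k'$;

• For all $\alpha \in \mathbb{F}_q$ and $\beta \in S$: $|S_0 \cap (\alpha + \beta S_1)| \equiv 0 \mod (2)$.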
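
For small prime fields the conditions of Definition 3 can be checked by brute force. The following sketch does this over $\mathbb{Z}_q$ for prime q only (it uses plain modular arithmetic, so it does not cover extension fields). The function name and the example sets for q = 7 are our own choices for illustration.

```python
def is_algebraically_nice_witness(q, S, S0, S1):
    """Check the three bullet conditions of Definition 3 over Z_q, q prime.

    S0 and S1 are candidate witness sets; |S1| plays the role of k'.
    """
    S0 = set(S0)
    if not S0 or len(set(S1)) % 2 == 0:
        return False
    for alpha in range(q):
        for beta in S:
            shifted = {(alpha + beta * s) % q for s in S1}
            # The intersection with every set alpha + beta * S1 must be even.
            if len(S0 & shifted) % 2 != 0:
                return False
    return True
```

For q = 7 and S = {1, 2, 4} (the subgroup generated by 2), the sets $S_1 = \{0, 1, 3\}$ and $S_0 = \{2, 4, 5, 6\}$ pass the check: all sets $\alpha + \beta S_1$ are translates of $\{0, 1, 3\}$, and $S_0$ is the complement of one of them.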

The following lemma shows that for an algebraically nice set S, the set $S_0$ can always be chosen to be large. It
is a straightforward generalization of [34, lemma 15].

Lemma 4 Let $S \subseteq \mathbb{F}_q^*$ be a k algebraically nice set. Let $S_0, S_1 \subseteq \mathbb{F}_q$ be sets from the definition of algebraic
niceness of S. One can always redefine the set $S_0$ to satisfy $|S_0| \ge \lceil q/2 \rceil$.

Proof: Let L be the linear subspace of $\mathbb{F}_2^q$ spanned by the incidence vectors of the sets $\alpha + \beta S_1$, for $\alpha \in \mathbb{F}_q$ and
$\beta \in S$. Observe that L is invariant under the actions of a 1-transitive permutation group (permuting the coordinates
in accordance with addition in $\mathbb{F}_q$). This implies that the space $L^{\perp}$ is also invariant under the actions of the same
group. Note that $L^{\perp}$ has positive dimension since it contains the incidence vector of the set $S_0$. The last two
observations imply that $L^{\perp}$ has full support, i.e., for every $i \in [q]$ there exists a vector $v \in L^{\perp}$ such that $v_i \ne 0$. It
is easy to verify that any linear subspace of $\mathbb{F}_2^q$ that has full support contains a vector of Hamming weight at least
$\lceil q/2 \rceil$. Let $v \in L^{\perp}$ be such a vector. Redefining the set $S_0$ to be the set of nonzero coordinates of v we conclude
the proof. ■

We now proceed to the core proposition of [34] that shows how sets exhibiting both combinatorial and algebraic 
niceness yield locally decodable codes. 

Proposition 5 Suppose $S \subseteq \mathbb{F}_q^*$ is t combinatorially nice and k algebraically nice; then for every positive integer
n there exists a code of length $\exp(n^{1/t})$ that is $(k, \delta, 2k\delta)$ locally decodable for all $\delta > 0$.

Proof: Our proof comes in three steps. We specify encoding and local decoding procedures for our codes and
then argue the lower bound for the probability of correct decoding. We use the notation from definitions 2 and 3.

Encoding: We assume that our message has length $n = \lfloor c m^t \rfloor$ for some value of m. (Otherwise we pad the
message with zeros. It is easy to see that such padding does not affect the asymptotic length of the code.) Our
code will be linear. Therefore it suffices to specify the encoding of unit vectors $e_1, \ldots, e_n$, where $e_j$ has length n
and a unique non-zero coordinate j. We define the encoding of $e_j$ to be a $q^m$ long vector, whose coordinates are
labelled by elements of $\mathbb{F}_q^m$. For all $w \in \mathbb{F}_q^m$ we set:

$$\mathrm{Enc}(e_j)_w = \begin{cases} 1, & \text{if } (u_j, w) \in S_0; \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

It is straightforward to verify that we defined a code encoding n bits to $\exp(n^{1/t})$ bits.

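Local decoding: Given a (possibly corrupted) codeword y and an index $i \in [n]$, the decoding algorithm $\mathcal{A}$ picks
$w \in \mathbb{F}_q^m$, such that $(u_i, w) \in S_0$ uniformly at random, reads $k' \le k$ coordinates of y, and outputs the sum
(here and below, sums of bits are taken modulo 2):

$$\sum_{\lambda \in S_1} y_{w + \lambda v_i}. \tag{4}$$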

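Probability of correct decoding: First we argue that decoding is always correct if $\mathcal{A}$ picks $w \in \mathbb{F}_q^m$ such that
all bits of y in locations $\{w + \lambda v_i\}_{\lambda \in S_1}$ are not corrupted. We need to show that for all $i \in [n]$, $x \in \{0, 1\}^n$ and
$w \in \mathbb{F}_q^m$, such that $(u_i, w) \in S_0$:

$$\sum_{\lambda \in S_1} \left( \sum_{j=1}^{n} x_j \mathrm{Enc}(e_j) \right)_{w + \lambda v_i} = x_i. \tag{5}$$

Note that

$$\sum_{\lambda \in S_1} \left( \sum_{j=1}^{n} x_j \mathrm{Enc}(e_j) \right)_{w + \lambda v_i} = \sum_{j=1}^{n} x_j \sum_{\lambda \in S_1} \mathrm{Enc}(e_j)_{w + \lambda v_i} = \sum_{j=1}^{n} x_j \sum_{\lambda \in S_1} I\left[(u_j, w + \lambda v_i) \in S_0\right], \tag{6}$$

where $I[\gamma \in S_0] = 1$ if $\gamma \in S_0$ and zero otherwise. Now note that

$$\sum_{\lambda \in S_1} I\left[(u_j, w + \lambda v_i) \in S_0\right] = \sum_{\lambda \in S_1} I\left[(u_j, w) + \lambda (u_j, v_i) \in S_0\right] = \begin{cases} 1, & \text{if } i = j; \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

The last identity in (7) for i = j follows from: $(u_i, v_i) = 0$, $(u_i, w) \in S_0$ and $k' = |S_1|$ is odd. The last identity
for $i \ne j$ follows from $(u_j, v_i) \in S$ and the algebraic niceness of S. Combining identities (6) and (7) we get (5).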

Now assume that up to $\delta$ fraction of bits of y are corrupted. Let $T_i$ denote the set of coordinates whose labels
belong to $\{w \in \mathbb{F}_q^m \mid (u_i, w) \in S_0\}$. Recall that by lemma 4, $|T_i| \ge q^m/2$. Thus at most $2\delta$ fraction of
coordinates in $T_i$ contain corrupted bits. Let $Q_i = \{\{w + \lambda v_i\}_{\lambda \in S_1} \mid w : (u_i, w) \in S_0\}$ be the family of k'-tuples
of coordinates that may be queried by $\mathcal{A}$. $(u_i, v_i) = 0$ implies that elements of $Q_i$ uniformly cover the set $T_i$.
Combining the last two observations we conclude that with probability at least $1 - 2k\delta$, $\mathcal{A}$ picks an uncorrupted
k'-tuple and outputs the correct value of $x_i$. ■
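
The encoding (3) and decoder (4) can be exercised on a toy instance. The sketch below works over $\mathbb{F}_7$ with S = {1, 2, 4}, and assumes the witness sets $S_1 = \{0, 1, 3\}$ and $S_0 = \{2, 4, 5, 6\}$. For the vectors $u_i, v_i$ it uses tensor squares of subset-indicator vectors. This is our own simplification: it satisfies the two conditions of Definition 2 but is not claimed to match the parameters of [34, lemma 13]. All function names are hypothetical, and codeword bits are computed lazily because the full codeword is far too long to store.

```python
import itertools
import random

P = 7                     # Mersenne prime 2^3 - 1
S1 = (0, 1, 3)            # assumed witness set of odd size k' = 3
S0 = {2, 4, 5, 6}         # assumed witness set with even intersections

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

def tensor_square(u):
    return [a * b % P for a in u for b in u]

def nice_vectors(m_prime):
    """Vectors with (u_i, v_i) = 0 and (u_j, v_i) in {1, 2, 4} for i != j."""
    us, vs = [], []
    for subset in itertools.combinations(range(m_prime), P - 1):
        u = [1 if c in subset else 0 for c in range(m_prime)]
        v = [1 - a for a in u]          # indicator of the complement
        us.append(tensor_square(u))     # squaring maps F_7^* onto {1, 2, 4}
        vs.append(tensor_square(v))
    return us, vs

def codeword_bit(x, us, w):
    """Bit of C(x) at position w, following (3) and linearity."""
    return sum(xj for xj, uj in zip(x, us) if dot(uj, w) in S0) % 2

def decode(x, us, vs, i, rng):
    """Local decoder (4), reading an uncorrupted codeword lazily."""
    m = len(us[0])
    while True:  # rejection sampling of w with (u_i, w) in S0
        w = [rng.randrange(P) for _ in range(m)]
        if dot(us[i], w) in S0:
            break
    total = 0
    for lam in S1:
        position = [(a + lam * b) % P for a, b in zip(w, vs[i])]
        total += codeword_bit(x, us, position)
    return total % 2
```

With `m_prime = 8` this gives 28 message bits and positions labelled by vectors in $\mathbb{F}_7^{64}$. On an uncorrupted codeword the decoder returns $x_i$ for every choice of w, which is the content of identities (5)-(7).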



All locally decodable codes constructed in this paper are obtained by applying proposition 5 to certain nice
sets. Thus all our codes have the same dependence of $\epsilon$ (the probability of the decoding error) on $\delta$ (the fraction
of corrupted bits). In what follows we often ignore these parameters and consider only the length and query
complexity of codes.



4 Mersenne numbers with large prime factors yield nice subsets of prime fields 



In what follows let $\langle 2 \rangle \subseteq \mathbb{F}_p^*$ denote the multiplicative subgroup of $\mathbb{F}_p^*$ generated by 2. In [34] it is shown
that for every Mersenne prime $p = 2^t - 1$ the set $\langle 2 \rangle \subseteq \mathbb{F}_p^*$ is simultaneously 3 algebraically nice and $\mathrm{ord}_2(p)$
combinatorially nice. In this section we prove the same conclusion for a substantially broader class of primes.

Lemma 6 Suppose p is an odd prime; then $\langle 2 \rangle \subseteq \mathbb{F}_p^*$ is $\mathrm{ord}_2(p)$ combinatorially nice.

Proof: Let $t = \mathrm{ord}_2(p)$. Clearly, t divides p - 1. We need to specify a constant c > 0 such that for every positive
integer m there exist two $n = \lfloor c m^t \rfloor$-sized collections of m long vectors over $\mathbb{F}_p$ satisfying:

• For all $i \in [n]$, $(u_i, v_i) = 0$;

• For all $i, j \in [n]$ such that $i \ne j$, $(u_j, v_i) \in \langle 2 \rangle$.

First assume that m has the shape $m = \binom{m' - 1 + (p-1)/t}{(p-1)/t}$, for some integer $m' \ge p - 1$. In this case [34, lemma
13] gives us a collection of $n = \binom{m'}{p-1}$ vectors with the right properties. Observe that $n \ge c m^t$ for a constant
c that depends only on p and t. Now assume m does not have the right shape, and let $m_1$ be the largest integer
smaller than m that does have it. In order to get vectors of length m we use vectors of length $m_1$ coming from [34,
lemma 13] padded with zeros. It is not hard to verify such a construction still gives us $n \ge c m^t$ large families of
vectors for a suitably chosen constant c. ■
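
The claim $n \ge c m^t$ can be checked numerically. The sketch below assumes the shapes $m = \binom{m' - 1 + (p-1)/t}{(p-1)/t}$ and $n = \binom{m'}{p-1}$ as we read them from the proof above, and prints nothing; it only exposes the ratio $n / m^t$. The function names are ours.

```python
from math import comb

def shape_and_count(p, t, m_prime):
    """Vector length m and family size n for m' >= p - 1, where t divides p - 1."""
    if (p - 1) % t != 0 or m_prime < p - 1:
        raise ValueError("need t | p - 1 and m' >= p - 1")
    r = (p - 1) // t
    m = comb(m_prime - 1 + r, r)
    n = comb(m_prime, p - 1)
    return m, n

def ratio(p, t, m_prime):
    """n / m^t, which should stay bounded away from zero as m' grows."""
    m, n = shape_and_count(p, t, m_prime)
    return n / m ** t
```

For p = 7 and t = 3 we have $m \approx m'^2/2$ and $n \approx m'^6/720$, so the ratio tends to 8/720 = 1/90 as m' grows, which is consistent with the existence of a constant c depending only on p and t.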

We use the standard notation $\overline{\mathbb{F}}$ to denote the algebraic closure of the field $\mathbb{F}$. Also let $C_p \subseteq \overline{\mathbb{F}}_2^*$ denote the
multiplicative subgroup of p-th roots of unity in $\overline{\mathbb{F}}_2$. The next lemma generalizes [34, lemma 14].

Lemma 7 Let p be a prime and k be odd. Suppose there exist $\zeta_1, \ldots, \zeta_k \in C_p$ such that

$$\zeta_1 + \ldots + \zeta_k = 0; \tag{8}$$

then $\langle 2 \rangle \subseteq \mathbb{F}_p^*$ is k algebraically nice.

Proof: In what follows we define the set $S_1 \subseteq \mathbb{F}_p$ and prove the existence of a set $S_0$ such that together $S_0$
and $S_1$ yield k algebraic niceness of $\langle 2 \rangle$. Identity (8) implies that there exists an odd integer $k' \le k$ and k' distinct
p-th roots of unity $\zeta'_1, \ldots, \zeta'_{k'} \in C_p$ such that

$$\zeta'_1 + \ldots + \zeta'_{k'} = 0 \tag{9}$$

(pairs of equal roots cancel in characteristic 2 and can be discarded). Let $t = \mathrm{ord}_2(p)$. Observe that $C_p \subseteq \mathbb{F}_{2^t}$.
Let g be a generator of $C_p$. Identity (9) yields $g^{\gamma_1} + \ldots + g^{\gamma_{k'}} = 0$, for
some distinct values of $\{\gamma_i\}_{i \in [k']}$. Set $S_1 = \{\gamma_1, \ldots, \gamma_{k'}\}$.

Consider a natural one to one correspondence between subsets S' of $\mathbb{F}_p$ and polynomials $\phi_{S'}(x)$ in the ring
$\mathbb{F}_2[x]/(x^p - 1)$: $\phi_{S'}(x) = \sum_{s \in S'} x^s$. It is easy to see that for all sets $S' \subseteq \mathbb{F}_p$ and all $\alpha, \beta \in \mathbb{F}_p$, such that $\beta \ne 0$:

$$\phi_{\alpha + \beta S'}(x) = x^{\alpha} \phi_{S'}(x^{\beta}).$$

Let $\alpha$ be a variable ranging over $\mathbb{F}_p$ and $\beta$ be a variable ranging over $\langle 2 \rangle$. We are going to argue the existence of a
set $S_0$ that has even intersections with all sets of the form $\alpha + \beta S_1$, by showing that all polynomials $\phi_{\alpha + \beta S_1}$ belong
to a certain linear space $L \subseteq \mathbb{F}_2[x]/(x^p - 1)$ of dimension less than p. In this case any nonempty set $T \subseteq \mathbb{F}_p$ such
that $\phi_T \in L^{\perp}$ can be used as the set $S_0$. Let $\tau(x) = \gcd(x^p - 1, \phi_{S_1}(x))$. Note that $\tau(x) \ne 1$ since g is a common
root of $x^p - 1$ and $\phi_{S_1}(x)$. Let L be the space of polynomials in $\mathbb{F}_2[x]/(x^p - 1)$ that are multiples of $\tau(x)$. Clearly,
$\dim L = p - \deg \tau$. Fix some $\alpha \in \mathbb{F}_p$ and $\beta \in \langle 2 \rangle$. Let us prove that $\phi_{\alpha + \beta S_1}(x)$ is in L:

$$\phi_{\alpha + \beta S_1}(x) = x^{\alpha} \phi_{S_1}(x^{\beta}) = x^{\alpha} \left(\phi_{S_1}(x)\right)^{\beta}.$$

The last identity above follows from the fact that for any $f \in \mathbb{F}_2[x]$ and any integer i: $f(x^{2^i}) = (f(x))^{2^i}$. ■
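
The dimension count in this proof is easy to reproduce for small p. The following sketch represents polynomials over $\mathbb{F}_2$ as integer bit masks (bit s stands for $x^s$) and computes $\dim L = p - \deg \gcd(x^p - 1, \phi_{S_1}(x))$. The helper names are ours.

```python
def poly_mod(a, b):
    """Remainder of a modulo b for GF(2)[x] polynomials stored as bit masks."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def poly_gcd(a, b):
    """Euclidean algorithm in GF(2)[x]."""
    while b:
        a, b = b, poly_mod(a, b)
    return a

def subspace_dimension(p, S1):
    """dim L = p - deg gcd(x^p - 1, phi_{S1}(x)), as in the proof of lemma 7."""
    x_p_minus_1 = (1 << p) | 1      # x^p + 1, which equals x^p - 1 over GF(2)
    phi = 0
    for s in S1:
        phi ^= 1 << (s % p)
    tau = poly_gcd(x_p_minus_1, phi)
    return p - (tau.bit_length() - 1)
```

For p = 7 and $S_1 = \{0, 1, 3\}$ the gcd is $x^3 + x + 1$, so L has dimension 4 < 7 and a nonempty $S_0$ exists. For $S_1 = \{0, 1, 2\}$ the gcd is 1, the space L is everything, and this route yields no $S_0$.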



In what follows we present sufficient conditions for the existence of k-tuples of p-th roots of unity in $\overline{\mathbb{F}}_2$ that
sum to zero. We treat the k = 3 case separately since in that case we can use a specialized argument to derive a
more explicit conclusion.

4.1 A sufficient condition for the existence of three p-th roots of unity summing to zero 

Lemma 8 Let p be an odd prime. Suppose $\mathrm{ord}_2(p) < (4/3) \log_2 p$; then there exist three p-th roots of unity in $\overline{\mathbb{F}}_2$
that sum to zero.

Proof: We start with a brief review of some basic concepts of projective algebraic geometry. Let $\mathbb{F}$ be a field, and
$f \in \mathbb{F}[x, y, z]$ be a homogeneous polynomial. A triple $(x_0, y_0, z_0) \in \mathbb{F}^3$ is called a zero of f if $f(x_0, y_0, z_0) = 0$.
A zero is called nontrivial if it is different from the origin. An equation f = 0 defines a projective plane curve
$\chi_f$. Nontrivial zeros of f considered up to multiplication by a scalar are called $\mathbb{F}$-rational points of $\chi_f$. If $\mathbb{F}$ is a finite
field it makes sense to talk about the number of $\mathbb{F}$-rational points on a curve.

Let $t = \mathrm{ord}_2(p)$. Note that $C_p \subseteq \mathbb{F}_{2^t}$. Consider a projective plane Fermat curve $\chi$ defined by

$$x^{(2^t - 1)/p} + y^{(2^t - 1)/p} + z^{(2^t - 1)/p} = 0. \tag{10}$$

Let us call a point a on $\chi$ trivial if one of the coordinates of a is zero. Cyclicity of $\mathbb{F}_{2^t}^*$ implies that $\chi$ contains
exactly $3(2^t - 1)/p$ trivial $\mathbb{F}_{2^t}$-rational points. Note that every nontrivial point of $\chi$ yields a triple of elements of
$C_p$ that sum to zero. The classical Weil bound [17, p. 330] provides an estimate

$$|N_q - (q + 1)| \le (d - 1)(d - 2)\sqrt{q} \tag{11}$$

for the number $N_q$ of $\mathbb{F}_q$-rational points on an arbitrary smooth projective plane curve of degree d. (11) implies
that in case

$$2^t + 1 > \left(\frac{2^t - 1}{p} - 1\right)\left(\frac{2^t - 1}{p} - 2\right) 2^{t/2} + 3\,\frac{2^t - 1}{p} \tag{12}$$

there exists a nontrivial point on the curve (10). Note that (12) follows from

$$2^t + 1 > \left(\frac{2^t}{p}\right)\left(\frac{2^t}{p}\right) 2^{t/2} - \frac{2^t}{p}\, 2^{t/2 + 1} + \frac{3 \cdot 2^t}{p}, \tag{13}$$

and (13) follows from

$$2^t > 2^{2t + t/2}/p^2 \quad \text{and} \quad 2^{t/2 + 1} > 3.$$

Now note that the first inequality above follows from $t < (4/3) \log_2 p$ and the second follows from t > 1. ■

Note that the constant 4/3 in lemma 8 cannot be improved to 2: there are no three elements of $C_{13264529}$ that
sum to zero, even though $\mathrm{ord}_2(13264529) = 47 < 2 \cdot \log_2 13264529 \approx 47.3$.
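
For moderate p the conclusion of lemma 8 can be confirmed by direct search in $\mathbb{F}_{2^t}$. The sketch below is an illustration under stated assumptions: field elements are bit masks, `modulus` must be an irreducible polynomial of degree t over $\mathbb{F}_2$ supplied by the caller (we use $x^{23} + x^5 + 1$ for t = 23, a standard primitive trinomial), and all function names are ours. Since a zero-sum triple can be scaled so that one element equals 1, it suffices to look for $c \in C_p$ with $1 + c \in C_p$.

```python
def gf_mul(a, b, t, modulus):
    """Multiply a and b in GF(2^t) = GF(2)[x]/(modulus); elements are bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> t:          # degree reached t, reduce by the modulus
            a ^= modulus
    return result

def gf_pow(a, e, t, modulus):
    """Square-and-multiply exponentiation in GF(2^t)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a, t, modulus)
        a = gf_mul(a, a, t, modulus)
        e >>= 1
    return result

def zero_sum_triple(p, t, modulus):
    """Return (1, c, 1 + c) with all three entries p-th roots of unity, or None."""
    # Raise the element x (bit mask 2) to the power (2^t - 1)/p.
    zeta = gf_pow(2, (2 ** t - 1) // p, t, modulus)
    if zeta == 1:
        raise ValueError("x^((2^t-1)/p) is trivial; choose another modulus")
    roots, c = set(), 1
    for _ in range(p):      # enumerate the cyclic group generated by zeta
        roots.add(c)
        c = gf_mul(c, zeta, t, modulus)
    for c in roots:
        if c != 1 and (c ^ 1) in roots:   # addition in GF(2^t) is XOR
            return (1, c, c ^ 1)
    return None
```

Calling `zero_sum_triple(178481, 23, (1 << 23) | (1 << 5) | 1)` searches the 178481-st roots of unity in $\mathbb{F}_{2^{23}}$, the case covered by lemma 8 since $23 < (4/3) \log_2 178481$.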

4.2 A sufficient condition for the existence of k p-th roots of unity summing to zero 

Our argument in this section comes in three steps. First we briefly review the notion of (additive) Fourier
coefficients of subsets of $\mathbb{F}_{2^t}$. Next, we invoke a folklore argument to show that subsets of $\mathbb{F}_{2^t}$ with appropriately
small nontrivial Fourier coefficients contain k-tuples of elements that sum to zero. Finally, we use a recent result of
Bourgain and Chang [5] (generalizing the classical estimate for Gauss sums) to argue that (under certain constraints
on p) all nontrivial Fourier coefficients of $C_p$ are small.

For $x \in \mathbb{F}_{2^t}$ let $\mathrm{Tr}(x) = x + x^2 + \ldots + x^{2^{t-1}}$ denote the trace of x. It is not hard to verify that for all x,
$\mathrm{Tr}(x) \in \mathbb{F}_2$. Characters of $\mathbb{F}_{2^t}$ are homomorphisms from the additive group of $\mathbb{F}_{2^t}$ into the multiplicative group
$\{\pm 1\}$. There exist $2^t$ characters. We denote characters by $\chi_a$, where a ranges in $\mathbb{F}_{2^t}$, and set $\chi_a(x) = (-1)^{\mathrm{Tr}(ax)}$.
Let C(x) denote the incidence function of a set $C \subseteq \mathbb{F}_{2^t}$. For arbitrary $a \in \mathbb{F}_{2^t}$ the Fourier coefficient $\chi_a(C)$ is
defined by $\chi_a(C) = \sum \chi_a(x) C(x)$, where the sum is over all $x \in \mathbb{F}_{2^t}$. Fourier coefficient $\chi_0(C) = |C|$ is called
trivial, and other Fourier coefficients are called nontrivial. In what follows $\sum_{\chi}$ stands for summation over all $2^t$
characters of $\mathbb{F}_{2^t}$. We need the following two standard properties of characters and Fourier coefficients.

$$\sum_{\chi} \chi(x) = \begin{cases} 2^t, & \text{if } x = 0; \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

$$\sum_{\chi} \chi^2(C) = 2^t |C|. \tag{15}$$



The following lemma is folklore.

Lemma 9 Let $C \subseteq \mathbb{F}_{2^t}$ and $k \ge 3$ be a positive integer. Let F be the largest absolute value of a nontrivial Fourier
coefficient of C. Suppose

$$\frac{F}{|C|} < \left(\frac{|C|}{2^t}\right)^{1/(k-2)}; \tag{16}$$

then there exist k elements of C that sum to zero.

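Proof: Let $M(C) = \#\{\zeta_1, \ldots, \zeta_k \in C \mid \zeta_1 + \ldots + \zeta_k = 0\}$. (14) yields

$$M(C) = \frac{1}{2^t} \sum_{x_1, \ldots, x_k \in \mathbb{F}_{2^t}} C(x_1) \cdots C(x_k) \sum_{\chi} \chi(x_1 + \ldots + x_k). \tag{17}$$

Note that $\chi(x_1 + \ldots + x_k) = \chi(x_1) \cdots \chi(x_k)$. Changing the order of summation in (17) we get

$$M(C) = \frac{1}{2^t} \sum_{\chi} \sum_{x_1, \ldots, x_k \in \mathbb{F}_{2^t}} C(x_1) \cdots C(x_k) \chi(x_1) \cdots \chi(x_k) = \frac{1}{2^t} \sum_{\chi} \chi^k(C). \tag{18}$$

Note that

$$\frac{1}{2^t} \sum_{\chi} \chi^k(C) = \frac{|C|^k}{2^t} + \frac{1}{2^t} \sum_{\chi \ne \chi_0} \chi^k(C) \ge \frac{|C|^k}{2^t} - \frac{1}{2^t} F^{k-2} \sum_{\chi} \chi^2(C) = \frac{|C|^k}{2^t} - F^{k-2} |C|, \tag{19}$$

where the last identity follows from (15). Combining (18) and (19) we conclude that (16) implies M(C) > 0. ■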
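
Identity (18) can be checked against brute-force counting on small sets. In the sketch below, elements of $\mathbb{F}_{2^t}$ are t-bit integers and addition is XOR. As a simplifying assumption we index characters by the bitwise inner product, $x \mapsto (-1)^{\langle a, x \rangle}$, instead of the trace form; both descriptions enumerate the same $2^t$ additive characters, so the sum over all characters is unchanged. The function names are ours.

```python
from itertools import product

def fourier_coefficients(C, t):
    """All 2^t coefficients chi_a(C) = sum over x in C of (-1)^<a, x>."""
    return [sum(-1 if bin(a & x).count("1") % 2 else 1 for x in C)
            for a in range(2 ** t)]

def count_zero_sums_fourier(C, t, k):
    """M(C) computed as 2^(-t) times the sum of chi(C)^k, as in (18)."""
    total = sum(coef ** k for coef in fourier_coefficients(C, t))
    return total // (2 ** t)

def count_zero_sums_brute(C, k):
    """Number of ordered k-tuples of elements of C whose XOR is zero."""
    count = 0
    for tup in product(C, repeat=k):
        acc = 0
        for z in tup:
            acc ^= z
        count += acc == 0
    return count
```

For C equal to all nonzero elements of $\mathbb{F}_8$ and k = 3, the trivial coefficient is 7 and every nontrivial coefficient is -1, so (18) gives $(343 - 7)/8 = 42$ ordered triples, in agreement with direct enumeration.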

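The following lemma is a special case of [5, theorem 1].

Lemma 10 Assume that $n \mid 2^t - 1$ and satisfies the condition

$$\gcd\left(n, \frac{2^t - 1}{2^{t'} - 1}\right) < 2^{t(1 - \epsilon) - t'}, \quad \text{for all } 1 \le t' < t,\ t' \mid t,$$

where $\epsilon > 0$ is arbitrary and fixed. Then for all $a \in \mathbb{F}_{2^t}^*$

$$\left| \sum_{x \in \mathbb{F}_{2^t}} (-1)^{\mathrm{Tr}(a x^n)} \right| < c_1 2^{t(1 - \delta)}, \tag{20}$$

where $\delta = \delta(\epsilon) > 0$ and $c_1 = c_1(\epsilon)$ are absolute constants.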



Below is the main result of this section. Recall that $C_p$ denotes the set of p-th roots of unity in $\overline{\mathbb{F}}_2$.

Lemma 11 For every c > 0 there exists an odd integer k = k(c) such that the following implication holds. If p is
an odd prime and $\mathrm{ord}_2(p) < c \log_2 p$ then some k elements of $C_p$ sum to zero.

Proof: Note that if there exist k' elements of a set $C \subseteq \overline{\mathbb{F}}_2$ that sum to zero, where k' is odd; then there exist
k elements of C that sum to zero for every odd $k \ge k'$. Also note that the sum of all p-th roots of unity is
zero. Therefore given c it suffices to prove the existence of an odd k = k(c) that works for all sufficiently large
p. Let $t = \mathrm{ord}_2(p)$. Observe that $p > 2^{t/c}$. Assume p is sufficiently large so that t > 2c. Next we show that
the precondition of lemma 10 holds for $n = (2^t - 1)/p$ and $\epsilon = 1/(2c)$. Let $t' \mid t$ and $1 \le t' < t$. Clearly
$\gcd(2^{t'} - 1, p) = 1$. Therefore

$$\gcd\left(\frac{2^t - 1}{p}, \frac{2^t - 1}{2^{t'} - 1}\right) = \frac{2^t - 1}{p\,(2^{t'} - 1)} < \frac{2^{t(1 - 1/c)}}{2^{t'} - 1}, \tag{21}$$

where the inequality follows from $p > 2^{t/c}$. Clearly, t > 2c yields $2^{t/(2c)}/2 > 1$. Multiplying the right hand side
of (21) by $2^{t/(2c)}/2$ and using $2(2^{t'} - 1) \ge 2^{t'}$ we get

$$\gcd\left(\frac{2^t - 1}{p}, \frac{2^t - 1}{2^{t'} - 1}\right) < 2^{t(1 - 1/(2c)) - t'}. \tag{22}$$

Combining (22) with lemma 10 we conclude that there exist $\delta > 0$ and $c_1$ such that for all $a \in \mathbb{F}_{2^t}^*$

$$\left| \sum_{x \in \mathbb{F}_{2^t}} (-1)^{\mathrm{Tr}\left(a x^{(2^t - 1)/p}\right)} \right| < c_1 2^{t(1 - \delta)}. \tag{23}$$

Observe that $x^{(2^t - 1)/p}$ takes every value in $C_p$ exactly $(2^t - 1)/p$ times when x ranges over $\mathbb{F}_{2^t}^*$. Thus (23) implies

$$(2^t - 1)(F/p) < c_1 2^{t(1 - \delta)}, \tag{24}$$

where F denotes the largest nontrivial Fourier coefficient of $C_p$. (24) yields $F/p < (2c_1) 2^{-\delta t}$. Pick $k \ge 3$ to be
the smallest odd integer such that $(1 - 1/c)/(k - 2) < \delta$. We now have

$$\frac{F}{p} < 2^{-\frac{(1 - 1/c)t}{k - 2}} \tag{25}$$

for all sufficiently large values of p. Combining $p > 2^{t/c}$ with (25) we get

$$\frac{F}{|C_p|} < \left(\frac{|C_p|}{2^t}\right)^{1/(k-2)},$$

and the application of lemma 9 concludes the proof. ■



4.3 Summary 



In this section we summarize our positive results and show that one does not necessarily need to use Mersenne
primes to construct locally decodable codes via the methods of [34]. It suffices to have Mersenne numbers with
polynomially large prime factors. Recall that P(m) denotes the largest prime factor of an integer m. Our first
theorem gets 3-query LDCs from Mersenne numbers m with prime factors larger than $m^{3/4}$.



Theorem 12 Suppose $P(2^t - 1) > 2^{0.75t}$; then for every message length n there exists a three query locally
decodable code of length $\exp(n^{1/t})$.

Proof: Let $P(2^t - 1) = p$. Observe that $p \mid 2^t - 1$ and $p > 2^{0.75t}$ yield $\mathrm{ord}_2(p) < (4/3) \log_2 p$. Combining
lemmas 8, 7 and 6 with proposition 5 we obtain the statement of the theorem. ■

As an example application of theorem 12 one can observe that $P(2^{23} - 1) = 178481 > 2^{(3/4) \cdot 23} \approx 155872$ yields
a family of three query locally decodable codes of length $\exp(n^{1/23})$. Theorem 12 immediately yields:

Theorem 13 Suppose for infinitely many t we have $P(2^t - 1) > 2^{0.75t}$; then for every $\epsilon > 0$ there exists a family
of three query locally decodable codes of length $\exp(n^{\epsilon})$.
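
The hypothesis of theorem 12 is easy to test for small t. The sketch below factors $2^t - 1$ by trial division, which is only practical for small exponents, and compares $P(2^t - 1)$ with $2^{0.75t}$ exactly via the equivalent integer inequality $P^4 > 2^{3t}$. The function names are ours.

```python
def largest_prime_factor(m):
    """Largest prime factor of m >= 2 by trial division (fine for small m)."""
    largest, d = 1, 2
    while d * d <= m:
        while m % d == 0:
            largest, m = d, m // d
        d += 1
    # Whatever remains above 1 is a prime larger than all factors found so far.
    return m if m > 1 else largest

def satisfies_theorem_12(t):
    """True if P(2^t - 1) > 2^(0.75 t), tested exactly as P^4 > 2^(3t)."""
    p = largest_prime_factor(2 ** t - 1)
    return p ** 4 > 2 ** (3 * t)
```

For t = 23 this recovers the example from the text, since $2^{23} - 1 = 47 \cdot 178481$. For t = 11 the test fails, because $2^{11} - 1 = 23 \cdot 89$ and 89 is well below $2^{8.25}$.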

The next theorem gets constant query LDCs from Mersenne numbers m with prime factors larger than $m^{\gamma}$ for
every value of $\gamma$.

Theorem 14 For every $\gamma > 0$ there exists an odd integer $k = k(\gamma)$ such that the following implication holds.
Suppose $P(2^t - 1) > 2^{\gamma t}$; then for every message length n there exists a k query locally decodable code of length
$\exp(n^{1/t})$.

Proof: Let $P(2^t - 1) = p$. Observe that $p \mid 2^t - 1$ and $p > 2^{\gamma t}$ yield $\mathrm{ord}_2(p) < (1/\gamma) \log_2 p$. Combining
lemmas 11, 7 and 6 with proposition 5 we obtain the statement of the theorem. ■

As an immediate corollary we get: 

Theorem 15 Suppose for some $\gamma > 0$ and infinitely many t we have $P(2^t - 1) > 2^{\gamma t}$; then there is a fixed k such
that for every $\epsilon > 0$ there exists a family of k query locally decodable codes of length $\exp(n^{\epsilon})$.

5 Nice subsets of finite fields yield Mersenne numbers with large prime factors 

Definition 16 We say that a sequence $\{S_i \subseteq \mathbb{F}_{q_i}^*\}_{i \ge 1}$ of subsets of finite fields is k-nice if every $S_i$ is k algebraically
nice and t(i) combinatorially nice, for some integer valued monotonically increasing function t.

The core proposition 5 asserts that a subset $S \subseteq \mathbb{F}_q^*$ that is k algebraically nice and t combinatorially nice yields
a family of k-query locally decodable codes of length $\exp(n^{1/t})$. Clearly, to get k-query LDCs of length $\exp(n^{\epsilon})$
for some fixed k and every $\epsilon > 0$ via this proposition, one needs to exhibit a k-nice sequence. In this section
we show how the existence of a k-nice sequence implies that infinitely many Mersenne numbers have large prime
factors. Our argument proceeds in two steps. First we show that a k-nice sequence yields an infinite sequence of
primes $\{p_i\}_{i \ge 1}$, where every $C_{p_i}$ contains a k-tuple of elements summing to zero. Next we show that $C_p$ contains
a short additive dependence only if p is a large factor of a Mersenne number.

5.1 A nice sequence yields infinitely many primes p with short dependencies between p-th roots of unity 

We start with some notation. Consider a finite field $\mathbb{F}_q = \mathbb{F}_{p^l}$, where p is prime. Fix a basis $e_1, \ldots, e_l$ of $\mathbb{F}_q$
over $\mathbb{F}_p$. In what follows we often write $(\alpha_1, \ldots, \alpha_l) \in \mathbb{F}_p^l$ to denote $\alpha = \sum_{i=1}^{l} \alpha_i e_i \in \mathbb{F}_q$. Let R denote the ring
$\mathbb{F}_2[x_1, \ldots, x_l]/(x_1^p - 1, \ldots, x_l^p - 1)$. Consider a natural one to one correspondence between subsets $S_1$ of $\mathbb{F}_q$ and
polynomials $\phi_{S_1}(x_1, \ldots, x_l) \in R$:

$$\phi_{S_1}(x_1, \ldots, x_l) = \sum_{(\alpha_1, \ldots, \alpha_l) \in S_1} x_1^{\alpha_1} \cdots x_l^{\alpha_l}.$$

It is easy to see that for all sets $S_1 \subseteq \mathbb{F}_q$ and all $\alpha, \beta \in \mathbb{F}_q$:

$$\phi_{(\alpha_1, \ldots, \alpha_l) + \beta S_1}(x_1, \ldots, x_l) = x_1^{\alpha_1} \cdots x_l^{\alpha_l}\, \phi_{\beta S_1}(x_1, \ldots, x_l). \tag{26}$$

Let $\Gamma$ be a family of subsets of $\mathbb{F}_q$. It is straightforward to verify that a set $S_0 \subseteq \mathbb{F}_q$ has even intersections with
every element of $\Gamma$ if and only if $\phi_{S_0}$ belongs to $L^{\perp}$, where L is the linear subspace of R spanned by $\{\phi_{S_1}\}_{S_1 \in \Gamma}$.
Combining the last observation with formula (26) we conclude that a set $S \subseteq \mathbb{F}_q^*$ is k algebraically nice if and
only if there exists a set $S_1 \subseteq \mathbb{F}_q$ of odd size $k' \le k$ such that the ideal generated by polynomials $\{\phi_{\beta S_1}\}_{\{\beta \in S\}}$
is a proper ideal of R. Note that polynomials $\{f_1, \ldots, f_h\} \in R$ generate a proper ideal if and only if polynomials
$\{f_1, \ldots, f_h, x_1^p - 1, \ldots, x_l^p - 1\}$ generate a proper ideal in $\mathbb{F}_2[x_1, \ldots, x_l]$. Also note that a family of polynomials
generates a proper ideal in $\mathbb{F}_2[x_1, \ldots, x_l]$ if and only if it generates a proper ideal in $\overline{\mathbb{F}}_2[x_1, \ldots, x_l]$. Now an
application of Hilbert's Nullstellensatz [7, p. 168] implies that a set $S \subseteq \mathbb{F}_q^*$ is k algebraically nice if and only
if there is a set $S_1 \subseteq \mathbb{F}_q$ of odd size $k' \le k$ such that the polynomials $\{\phi_{\beta S_1}\}_{\{\beta \in S\}}$ and $\{x_i^p - 1\}_{1 \le i \le l}$ have a
common root in $\overline{\mathbb{F}}_2$.

Lemma 17 Let $\mathbb{F}_q = \mathbb{F}_{p^l}$, where p is prime. Suppose $\mathbb{F}_q$ contains a nonempty k algebraically nice subset; then
there exist $\zeta_1, \ldots, \zeta_k \in C_p$ such that $\zeta_1 + \ldots + \zeta_k = 0$.

Proof: Assume $S \subseteq \mathbb{F}_q^*$ is nonempty and k algebraically nice. The discussion above implies that there exists
$S_1 \subseteq \mathbb{F}_q$ of odd size $k' \le k$ such that all polynomials $\{\phi_{\beta S_1}\}_{\{\beta \in S\}}$ vanish at some $(\zeta_1, \ldots, \zeta_l) \in C_p^l$. Fix an
arbitrary $\beta_0 \in S$, and note that $C_p$ is closed under multiplication. Thus,

$$\phi_{\beta_0 S_1}(\zeta_1, \ldots, \zeta_l) = 0 \tag{27}$$

yields k' p-th roots of unity that add up to zero. It is readily seen that one can extend (27) (by adding an appropriate
number of pairs of identical roots) to obtain k p-th roots of unity that add up to zero for any odd $k \ge k'$. ■

Note that lemma 17 does not suffice to prove that a k-nice sequence $\{S_i \subseteq \mathbb{F}_{q_i}^*\}_{i \ge 1}$ yields infinitely many primes p
with short (nontrivial) additive dependencies in $C_p$. We need to argue that the set $\{\mathrm{char}\, \mathbb{F}_{q_i}\}_{i \ge 1}$ cannot be finite. To
proceed, we need some more notation. Recall that $q = p^l$ and p is prime. For $x \in \mathbb{F}_q$ let $\mathrm{Tr}(x) = x + \ldots + x^{p^{l-1}} \in \mathbb{F}_p$
denote the (absolute) trace of x. For $\gamma \in \mathbb{F}_q$, $c \in \mathbb{F}_p^*$ we call the set $\pi_{\gamma, c} = \{x \in \mathbb{F}_q \mid \mathrm{Tr}(\gamma x) = c\}$ a proper
affine hyperplane of $\mathbb{F}_q$.

Lemma 18 Let ¥ q = ¥ p i , where p is prime. Suppose S C F* is k algebraically nice; then there exist h < p k 

h 

proper affine hyperplanes {^.CjIk^/j of¥ q such that S C (J 7r 7iiCi . 

' i=i 

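Proof: The discussion preceding lemma 17 implies that there exists a set $S_1 = \{\sigma_1, \ldots, \sigma_{k'}\} \subseteq \mathbb{F}_q$ of odd size
$k' \leq k$ such that all polynomials $\{\phi_{\beta S_1}\}_{\beta \in S}$ vanish at some $(\zeta_1, \ldots, \zeta_l) \in C_p^l$. Let $\zeta$ be a generator of $C_p$. For
every $1 \leq i \leq l$ pick $\omega_i \in \mathbb{Z}_p$ such that $\zeta_i = \zeta^{\omega_i}$. For every $\beta \in S$, $\phi_{\beta S_1}(\zeta_1, \ldots, \zeta_l) = 0$ yields

$$\sum_{\mu = (\mu_1, \ldots, \mu_l) \in \beta S_1} \zeta^{\sum_{i=1}^{l} \mu_i \omega_i} = 0. \tag{28}$$

Observe that for fixed values $\{\omega_i\}_{1 \leq i \leq l} \subseteq \mathbb{Z}_p$ the map $D(\mu) = \sum_{i=1}^{l} \mu_i \omega_i$ is a linear map from $\mathbb{F}_q$ to $\mathbb{F}_p$. It is
not hard to prove that every such map can be expressed as $D(\mu) = \operatorname{Tr}(\delta\mu)$ for an appropriate choice of $\delta \in \mathbb{F}_q$.
Therefore we can rewrite (28) as

$$\sum_{\mu \in \beta S_1} \zeta^{\operatorname{Tr}(\delta\mu)} = \sum_{\sigma \in S_1} \zeta^{\operatorname{Tr}(\delta\beta\sigma)} = 0. \tag{29}$$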



Let $W = \{(w_1, \ldots, w_{k'}) \in \mathbb{Z}_p^{k'} \mid \zeta^{w_1} + \cdots + \zeta^{w_{k'}} = 0\}$ denote the set of exponents of $k'$-dependencies between
powers of $\zeta$. Clearly, $|W| \leq p^k$. Identity (29) implies that every $\beta \in S$ satisfies

$$\begin{cases} \operatorname{Tr}((\delta\sigma_1)\beta) = w_1, \\ \quad\vdots \\ \operatorname{Tr}((\delta\sigma_{k'})\beta) = w_{k'}, \end{cases} \tag{30}$$

for an appropriate choice of $(w_1, \ldots, w_{k'}) \in W$. Note that the all-zeros vector does not lie in $W$ since $k'$ is odd.
Therefore at least one of the identities in (30) has a non-zero right-hand side, and defines a proper affine hyperplane
of $\mathbb{F}_q$. Collecting one such hyperplane for every element of $W$ we get a family of $|W|$ proper affine hyperplanes
containing every element of $S$. ■

Lemma 18 gives us some insight into the structure of algebraically nice subsets of $\mathbb{F}_q$. Our next goal is to develop
an insight into the structure of combinatorially nice subsets. We start by reviewing some relations between tensor
and dot products of vectors. For vectors $u \in \mathbb{F}_q^m$ and $v \in \mathbb{F}_q^n$ let $u \otimes v \in \mathbb{F}_q^{mn}$ denote the tensor product of $u$ and $v$.
Coordinates of $u \otimes v$ are labelled by all possible elements of $[m] \times [n]$ and $(u \otimes v)_{i,j} = u_i v_j$. Also, let $u^{\otimes l}$ denote
the $l$-th tensor power of $u$ and $u \circ v$ denote the concatenation of $u$ and $v$. The following identity is standard. For
any $u, x \in \mathbb{F}_q^m$ and $v, y \in \mathbb{F}_q^n$:

$$(u \otimes v, x \otimes y) = \sum_{i \in [m],\, j \in [n]} u_i v_j x_i y_j = \left(\sum_{i \in [m]} u_i x_i\right)\left(\sum_{j \in [n]} v_j y_j\right) = (u, x)(v, y). \tag{31}$$

In what follows we need a generalization of identity (31). Let $f(x_1, \ldots, x_h) = \sum_i c_i x_1^{\alpha_1^i} \cdots x_h^{\alpha_h^i}$ be a polynomial
in $\mathbb{F}_q[x_1, \ldots, x_h]$. Given $f$ we define $\bar{f} \in \mathbb{F}_q[x_1, \ldots, x_h]$ by $\bar{f} = \sum_i x_1^{\alpha_1^i} \cdots x_h^{\alpha_h^i}$, i.e., we simply set all nonzero
coefficients of $f$ to 1. For vectors $u_1, \ldots, u_h$ in $\mathbb{F}_q^m$ define

$$f(u_1, \ldots, u_h) = \mathop{\circ}_i \; c_i\, u_1^{\otimes \alpha_1^i} \otimes \cdots \otimes u_h^{\otimes \alpha_h^i}. \tag{32}$$

Note that to obtain $f(u_1, \ldots, u_h)$ we replaced products in $f$ by tensor products and addition by concatenation.
Clearly, $f(u_1, \ldots, u_h)$ is a vector whose length may be larger than $m$.

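Claim 19 For every $f \in \mathbb{F}_q[x_1, \ldots, x_h]$ and $u_1, \ldots, u_h, v_1, \ldots, v_h \in \mathbb{F}_q^m$:

$$\left(f(u_1, \ldots, u_h), \bar{f}(v_1, \ldots, v_h)\right) = f\left((u_1, v_1), \ldots, (u_h, v_h)\right). \tag{33}$$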

Proof: Let $u = (u_1, \ldots, u_h)$ and $v = (v_1, \ldots, v_h)$. Observe that if (33) holds for polynomials $f_1$ and $f_2$ defined
over disjoint sets of monomials then it also holds for $f = f_1 + f_2$:

$$\begin{aligned}
\left(f(u), \bar{f}(v)\right) &= \left((f_1 + f_2)(u), (\bar{f}_1 + \bar{f}_2)(v)\right) = \left(f_1(u) \circ f_2(u), \bar{f}_1(v) \circ \bar{f}_2(v)\right) \\
&= f_1\left((u_1, v_1), \ldots, (u_h, v_h)\right) + f_2\left((u_1, v_1), \ldots, (u_h, v_h)\right) = f\left((u_1, v_1), \ldots, (u_h, v_h)\right).
\end{aligned}$$

Therefore it suffices to prove (33) for monomials $f = c x_1^{\alpha_1} \cdots x_h^{\alpha_h}$. It remains to notice that identity (33) for monomials
$f = c x_1^{\alpha_1} \cdots x_h^{\alpha_h}$ follows immediately from formula (31) using induction on $\sum_{i=1}^{h} \alpha_i$. ■
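Identity (33) can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions: it works over the prime field $\mathbb{F}_5$ rather than a general $\mathbb{F}_q$, stores a polynomial as a dictionary from exponent tuples to coefficients, and uses an arbitrarily chosen polynomial and vectors. All names are hypothetical.

```python
from functools import reduce

P = 5  # we work over the prime field F_5; the claim itself is stated over any F_q

def dot(u, v):
    """Dot product over F_P."""
    return sum(a * b for a, b in zip(u, v)) % P

def tensor(u, v):
    """Tensor product: coordinates u_i * v_j indexed by pairs (i, j)."""
    return [(a * b) % P for a in u for b in v]

def tensor_monomial(vectors, exponents):
    """Tensor product of a_r copies of each u_r; the empty product is the vector (1)."""
    factors = [vec for vec, e in zip(vectors, exponents) for _ in range(e)]
    return reduce(tensor, factors, [1])

def vector_eval(poly, vectors, use_coefficients=True):
    """Definition (32): products become tensor products, addition becomes concatenation."""
    out = []
    for exponents, coeff in sorted(poly.items()):
        c = coeff if use_coefficients else 1  # coefficient 1 gives f-bar
        out += [(c * x) % P for x in tensor_monomial(vectors, exponents)]
    return out

def scalar_eval(poly, scalars):
    """Ordinary evaluation of the polynomial at field elements."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for s, e in zip(scalars, exponents):
            term *= pow(s, e, P)
        total += term
    return total % P

# f(x1, x2) = 2*x1^2*x2 + 3*x2^3 + 4*x1, an arbitrary example
f = {(2, 1): 2, (0, 3): 3, (1, 0): 4}
u = ([1, 2, 3], [4, 0, 1])
v = ([2, 2, 1], [3, 1, 4])

lhs_vector = vector_eval(f, u)
rhs_vector = vector_eval(f, v, use_coefficients=False)
lhs = dot(lhs_vector, rhs_vector)
rhs = scalar_eval(f, [dot(a, b) for a, b in zip(u, v)])
print(len(lhs_vector), lhs, rhs)
```

The two evaluated vectors have length $27 + 27 + 3 = 57$, which illustrates how the length grows with the degree of $f$.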

The next lemma bounds combinatorial niceness of certain subsets of $\mathbb{F}_q^*$.

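Lemma 20 Let $\mathbb{F}_q = \mathbb{F}_{p^l}$, where $p$ is prime. Let $S \subseteq \mathbb{F}_q^*$. Suppose there exist $h$ proper affine hyperplanes
$\{\pi_{\gamma_r, c_r}\}_{1 \leq r \leq h}$ of $\mathbb{F}_q$ such that $S \subseteq \bigcup_{r=1}^{h} \pi_{\gamma_r, c_r}$; then $S$ is at most $h(p-1)$ combinatorially nice.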



Proof: Assume $S$ is $t$ combinatorially nice. This implies that for some $c > 0$ and every $m$ there exist two
$n = \lfloor c m^t \rfloor$-sized collections of vectors $\{u_i\}_{i \in [n]}$ and $\{v_i\}_{i \in [n]}$ in $\mathbb{F}_q^m$, such that:

• For all $i \in [n]$, $(u_i, v_i) = 0$;

• For all $i, j \in [n]$ such that $i \neq j$, $(u_j, v_i) \in S$.

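For a vector $u \in \mathbb{F}_q^m$ and integer $e$ let $u^e$ denote a vector resulting from raising every coordinate of $u$ to the power $e$.
For every $i \in [n]$ and $r \in [h]$ define vectors $u_i^{(r)}$ and $v_i^{(r)}$ in $\mathbb{F}_q^{ml}$ by

$$u_i^{(r)} = (\gamma_r u_i) \circ (\gamma_r u_i)^p \circ \cdots \circ (\gamma_r u_i)^{p^{l-1}} \quad \text{and} \quad v_i^{(r)} = v_i \circ v_i^p \circ \cdots \circ v_i^{p^{l-1}}. \tag{34}$$

Note that for every $r_1, r_2 \in [h]$, $v_i^{(r_1)} = v_i^{(r_2)}$. It is straightforward to verify that for every $i, j \in [n]$ and $r \in [h]$:

$$\left(u_j^{(r)}, v_i^{(r)}\right) = \operatorname{Tr}\left(\gamma_r (u_j, v_i)\right). \tag{35}$$

Combining (35) with the fact that $S$ is covered by proper affine hyperplanes $\pi_{\gamma_r, c_r}$ we conclude that

• For all $i \in [n]$ and $r \in [h]$, $(u_i^{(r)}, v_i^{(r)}) = 0$;

• For all $i, j \in [n]$ such that $i \neq j$, there exists $r \in [h]$ such that $(u_j^{(r)}, v_i^{(r)}) \in \mathbb{F}_p^*$.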

Pick $g(x_1, \ldots, x_h) \in \mathbb{F}_p[x_1, \ldots, x_h]$ to be a homogeneous degree $h$ polynomial such that for $a = (a_1, \ldots, a_h) \in \mathbb{F}_p^h$:
$g(a) = 0$ if and only if $a$ is the all-zeros vector. The existence of such a polynomial $g$ follows from [17,
Example 6.7]. Set $f = g^{p-1}$. Note that for $a \in \mathbb{F}_p^h$: $f(a) = 0$ if $a$ is the all-zeros vector, and $f(a) = 1$ otherwise.
For all $i \in [n]$ define

$$u'_i = f\left(u_i^{(1)}, \ldots, u_i^{(h)}\right) \circ (1) \quad \text{and} \quad v'_i = \bar{f}\left(v_i^{(1)}, \ldots, v_i^{(h)}\right) \circ (-1). \tag{36}$$

Note that $f$ and $\bar{f}$ are homogeneous degree $(p-1)h$ polynomials in $h$ variables. Therefore (32) implies that
for all $i$ vectors $u'_i$ and $v'_i$ have length $m' \leq h^{(p-1)h} (ml)^{(p-1)h}$. Combining identities (36) and (33) and using the
properties of dot products between vectors $\{u_i^{(r)}\}$ and $\{v_i^{(r)}\}$ discussed above we conclude that for every $m$ there
exist two $n = \lfloor c m^t \rfloor$-sized collections of vectors $\{u'_i\}_{i \in [n]}$ and $\{v'_i\}_{i \in [n]}$ in $\mathbb{F}_q^{m'}$ such that:

• For all $i \in [n]$, $(u'_i, v'_i) = -1$;

• For all $i, j \in [n]$ such that $i \neq j$, $(u'_j, v'_i) = 0$.

It remains to notice that a family of vectors with such properties exists only if $n \leq m'$, i.e., $\lfloor c m^t \rfloor \leq h^{(p-1)h} (ml)^{(p-1)h}$.
Given that we can pick $m$ to be arbitrarily large, this implies that $t \leq (p-1)h$. ■
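The indicator behaviour of $f = g^{p-1}$ used in the proof is easy to check in a special case. The sketch below assumes $h = 2$ and $p \equiv 3 \pmod 4$, where $g = x_1^2 + x_2^2$ vanishes only at the origin because $-1$ is not a square modulo $p$. This particular $g$ is our choice for illustration; the general construction of [17, Example 6.7] is not reproduced here.

```python
from itertools import product

def g(a, p):
    """Homogeneous degree-2 form x1^2 + x2^2; it has no nonzero root over F_p when p % 4 == 3."""
    return (a[0] ** 2 + a[1] ** 2) % p

def f(a, p):
    """f = g^(p-1): equals 0 at the origin and, by Fermat's little theorem, 1 elsewhere."""
    return pow(g(a, p), p - 1, p)

def is_indicator(p):
    """Check that f is the indicator of 'a is not the all-zeros vector' on F_p^2."""
    return all(f(a, p) == (0 if a == (0, 0) else 1)
               for a in product(range(p), repeat=2))

print([is_indicator(p) for p in (3, 7, 11)])
```

For $p \equiv 1 \pmod 4$ this $g$ would have nonzero roots, so a different form is needed there.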

The next lemma presents the main result of this section. 

Lemma 21 Let $k$ be an odd integer. Suppose there exists a $k$-nice sequence; then for infinitely many primes $p$
some $k$ elements of $C_p$ add up to zero.

Proof: Assume $\{S_i \subseteq \mathbb{F}_{q_i}^*\}_{i \geq 1}$ is $k$-nice. Let $p$ be a fixed prime. Combining lemmas 18 and 20 we conclude
that every $k$ algebraically nice subset $S \subseteq \mathbb{F}_{p^l}^*$ is at most $(p-1)p^k$ combinatorially nice. Note that our bound on
combinatorial niceness is independent of $l$. Therefore there are only finitely many extensions of the field $\mathbb{F}_p$ in the
sequence $\{\mathbb{F}_{q_i}\}_{i \geq 1}$, and the set $P = \{\operatorname{char} \mathbb{F}_{q_i}\}_{i \geq 1}$ is infinite. It remains to notice that according to lemma 17 for
every $p \in P$ there exist $k$ elements of $C_p$ that add up to zero. ■

In what follows we present necessary conditions for the existence of $k$-tuples of $p$-th roots of unity in $\overline{\mathbb{F}}_2$ that
sum to zero. We treat the $k = 3$ case separately since in that case we can use a specialized argument to derive a
slightly stronger conclusion.



5.2 A necessary condition for the existence of k p-th roots of unity summing to zero 
Lemma 22 Let $k \geq 3$ be odd and $p$ be a prime. Suppose there exist $\zeta_1, \ldots, \zeta_k \in C_p$ such that $\sum_{i=1}^{k} \zeta_i = 0$; then

$$\operatorname{ord}_2(p) \leq 2p^{1 - 1/(k-1)}. \tag{37}$$

Proof: Let $t = \operatorname{ord}_2(p)$. Note that $C_p \subseteq \mathbb{F}_{2^t}$. Note also that all elements of $C_p$ other than the multiplicative
identity are proper elements of $\mathbb{F}_{2^t}$. Therefore for every $\zeta \in C_p$ where $\zeta \neq 1$ and every nonzero $f(x) \in \mathbb{F}_2[x]$ such that
$\deg f \leq t - 1$ we have: $f(\zeta) \neq 0$.

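By multiplying $\sum_{i=1}^{k} \zeta_i = 0$ through by $\zeta_k^{-1}$, we may reduce to the case $\zeta_k = 1$. Let $\zeta$ be a generator of $C_p$.
For every $i \in [k-1]$ pick $w_i \in \mathbb{Z}_p$ such that $\zeta_i = \zeta^{w_i}$. We now have $\sum_{i=1}^{k-1} \zeta^{w_i} + 1 = 0$. Set $h = \lfloor (t-1)/2 \rfloor$.
Consider the $(k-1)$-tuples:

$$(m w_1 + i_1, \ldots, m w_{k-1} + i_{k-1}) \in \mathbb{Z}_p^{k-1} \quad \text{for } m \in \mathbb{Z}_p \text{ and } i_1, \ldots, i_{k-1} \in [0, h]. \tag{38}$$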

Suppose two of these coincide, say

$$(m w_1 + i_1, \ldots, m w_{k-1} + i_{k-1}) = (m' w_1 + i'_1, \ldots, m' w_{k-1} + i'_{k-1}),$$

with $(m, i_1, \ldots, i_{k-1}) \neq (m', i'_1, \ldots, i'_{k-1})$. Set $n = m - m'$ and $j_l = i'_l - i_l$ for $l \in [k-1]$. We now have

$$(n w_1, \ldots, n w_{k-1}) = (j_1, \ldots, j_{k-1})$$

with $-h \leq j_1, \ldots, j_{k-1} \leq h$. Observe that $n \neq 0$, and thus it has a multiplicative inverse $g \in \mathbb{Z}_p$. Consider a
polynomial

$$P(z) = z^{j_1 + h} + \cdots + z^{j_{k-1} + h} + z^h \in \mathbb{F}_2[z].$$

Note that $\deg P \leq 2h \leq t - 1$. Note also that $P(1) = 1$ and $P(\zeta^g) = 0$. The latter identity contradicts the fact
that $\zeta^g$ is a proper element of $\mathbb{F}_{2^t}$. This contradiction implies that all $(k-1)$-tuples in (38) are distinct. This yields



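$$p^{k-1} \geq p\,(h+1)^{k-1} \geq p \left(\frac{t}{2}\right)^{k-1},$$

where the first inequality counts the tuples in (38) and the second uses $h + 1 \geq t/2$. The resulting bound
$p^{k-2} \geq (t/2)^{k-1}$ is equivalent to (37). ■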
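The counting step can be illustrated on one concrete dependency. The sketch below takes $p = 73$, $k = 3$ and the relation $\zeta^9 + \zeta + 1 = 0$, where $\zeta$ is a root of $x^9 + x + 1$ over $\mathbb{F}_2$; the code first confirms that $x^{73} \equiv 1$ modulo this polynomial, so $\zeta$ is indeed a 73rd root of unity. The choice of example and all names are ours, not the paper's. Polynomials over $\mathbb{F}_2$ are encoded as Python integers whose bits are the coefficients.

```python
from itertools import product

def polymod(a, m):
    """Reduce a modulo m in F_2[x]; polynomials are integers whose bits are coefficients."""
    while a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

p, k = 73, 3
modulus = 0b1000000011  # x^9 + x + 1

# x^73 = 1 modulo x^9 + x + 1, so a root zeta of the modulus lies in C_73, and
# zeta^9 + zeta^1 + 1 = 0 is a dependency with exponents (w_1, w_2) = (9, 1).
assert polymod(1 << p, modulus) == 1
w = (9, 1)

t = 9             # ord_2(73), since 2^9 - 1 = 511 = 7 * 73
h = (t - 1) // 2  # h = 4

# All (k-1)-tuples from (38): (m*w_1 + i_1, ..., m*w_{k-1} + i_{k-1}) in Z_p^(k-1).
tuples = {
    tuple((m * wi + ii) % p for wi, ii in zip(w, shifts))
    for m in range(p)
    for shifts in product(range(h + 1), repeat=k - 1)
}
print(len(tuples), p * (h + 1) ** (k - 1), p ** (k - 1))
```

Since the set has as many elements as there are choices of $(m, i_1, i_2)$, no two tuples coincide, in line with the argument above.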

5.3 A necessary condition for the existence of three p-th roots of unity summing to zero 

In this section we slightly strengthen lemma 22 in the special case when k = 3. Our argument is loosely inspired 
by the Agrawal-Kayal-Saxena deterministic primality test [1]. 

Lemma 23 Let $p$ be a prime. Suppose there exist $\zeta_1, \zeta_2, \zeta_3 \in C_p$ that sum up to zero; then

$$\operatorname{ord}_2(p) \leq \left((4/3)p\right)^{1/2}. \tag{39}$$

Proof: Let $t = \operatorname{ord}_2(p)$. Note that $C_p \subseteq \mathbb{F}_{2^t}$. Note also that all elements of $C_p$ other than the multiplicative
identity are proper elements of $\mathbb{F}_{2^t}$. Therefore for every $\zeta \in C_p$ where $\zeta \neq 1$ and every nonzero $f(x) \in \mathbb{F}_2[x]$ such that
$\deg f \leq t - 1$ we have: $f(\zeta) \neq 0$.

Observe that $\zeta_1 + \zeta_2 + \zeta_3 = 0$ implies $\zeta_1 \zeta_2^{-1} + 1 = \zeta_3 \zeta_2^{-1}$. This yields $(\zeta_1 \zeta_2^{-1} + 1)^p = 1$. Put $\zeta = \zeta_1 \zeta_2^{-1}$.
Note that $\zeta \neq 1$ and $\zeta, 1 + \zeta \in C_p$. Consider the products $\pi_{i,j} = \zeta^i (1 + \zeta)^j \in C_p$ for $0 \leq i, j \leq t - 1$. Note that
$\pi_{i,j}$ and $\pi_{k,l}$ with $(i, j) \neq (k, l)$ cannot be the same if $i \geq k$ and $l \geq j$, as then

$$\zeta^{i-k} - (1 + \zeta)^{l-j} = 0,$$

but the left side is a nonzero polynomial in $\zeta$ of degree less than $t$. In other words, if $\pi_{i,j} = \pi_{k,l}$ and $(i, j) \neq (k, l)$, then the pairs $(i, j)$ and
$(k, l)$ are comparable under termwise comparison. In particular, either $(k, l) = (i + a, j + b)$ or $(i, j) = (k + a, l + b)$
for some pair $(a, b)$ with $\pi_{a,b} = 1$.

We next check that there cannot be two distinct nonzero pairs $(a, b), (a', b')$ with $\pi_{a,b} = \pi_{a',b'} = 1$. As above,
these pairs must be comparable; we may assume without loss of generality that $a \leq a'$, $b \leq b'$. The equations
$\pi_{a,b} = 1$ and $\pi_{a'-a,\,b'-b} = 1$ force $a + b \geq t$ and $(a' - a) + (b' - b) \geq t$, so $a' + b' \geq 2t$. But $a', b' \leq t - 1$,
contradiction.

If there is no nonzero pair $(a, b)$ with $0 \leq a, b \leq t - 1$ and $\pi_{a,b} = 1$, then all $\pi_{i,j}$ are distinct, so $p \geq t^2$.
Otherwise, as above, the pair $(a, b)$ is unique, and the products $\pi_{i,j}$ over pairs $(i, j)$ with $0 \leq i, j \leq t - 1$ and $(i, j) \not\geq (a, b)$ are
pairwise distinct. The number of pairs excluded by the condition $(i, j) \not\geq (a, b)$ is $(t - a)(t - b)$; since $a + b \geq t$,
$(t - a)(t - b) \leq t^2/4$. Hence $p \geq t^2 - t^2/4 = 3t^2/4$ as desired. ■
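For a concrete view of this counting argument, the sketch below computes all products $\pi_{i,j}$ for $p = 73$, $t = 9$, taking $\zeta$ to be the class of $x$ in $\mathbb{F}_2[x]/(x^9 + x + 1)$. The choice of modulus, the integer encoding of polynomials over $\mathbb{F}_2$, and the function names are assumptions of this example; the code checks directly that $\zeta$ and $1 + \zeta$ are 73rd roots of unity.

```python
def polymulmod(a, b, m):
    """Multiply a and b in F_2[x] (bits are coefficients) and reduce modulo m."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    while result.bit_length() >= m.bit_length():
        result ^= m << (result.bit_length() - m.bit_length())
    return result

def powmod(a, e, m):
    """Raise a to the e-th power modulo m by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = polymulmod(result, a, m)
    return result

p, t = 73, 9
modulus = 0b1000000011  # x^9 + x + 1
zeta = 0b10             # the class of x

# Both zeta and 1 + zeta are 73rd roots of unity in this quotient ring.
assert powmod(zeta, p, modulus) == 1
assert powmod(zeta ^ 1, p, modulus) == 1

# pi_{i,j} = zeta^i * (1 + zeta)^j for 0 <= i, j <= t - 1
products = {
    (i, j): polymulmod(powmod(zeta, i, modulus), powmod(zeta ^ 1, j, modulus), modulus)
    for i in range(t) for j in range(t)
}
unit_pairs = [pair for pair, value in products.items() if value == 1 and pair != (0, 0)]
distinct_values = len(set(products.values()))
print(unit_pairs, distinct_values)
```

Here the unique nonzero pair with $\pi_{a,b} = 1$ is $(a, b) = (1, 8)$, so $(t - a)(t - b) = 8$ pairs are excluded and the remaining $81 - 8 = 73$ products are distinct, which is consistent with $p \geq 3t^2/4$.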

While the necessary condition given by lemma 23 is quite far away from the sufficient condition given by
lemma 8, it nonetheless suffices for checking that for most primes $p$, there do not exist three $p$-th roots of unity
summing to zero. For instance, among the 664578 odd primes $p \leq 10^7$, all but 550 are ruled out by lemma 23.
(There is an easy argument that $t$ must be odd if $p > 3$; this cuts the list down to 273 primes.) Each remaining
$p$ can be tested by computing $\gcd(x^p + 1, (x + 1)^p + 1)$; the only examples we found that did not satisfy the
condition of lemma 8 were $(p, t) = (73, 9), (262657, 27), (599479, 33), (121369, 39)$.
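The two-stage check described above can be sketched in Python as follows. This is a naive illustration, not the computation actually used for the figures quoted in the text: polynomials over $\mathbb{F}_2$ are encoded as integers whose bits are the coefficients, $(x+1)^p$ is expanded by $p$ successive multiplications (far too slow for primes near $10^7$), and all function names are hypothetical.

```python
def multiplicative_order_of_2(p):
    """Smallest t >= 1 with 2^t = 1 (mod p), for an odd prime p."""
    t, x = 1, 2 % p
    while x != 1:
        x = (2 * x) % p
        t += 1
    return t

def passes_lemma23(p):
    """Necessary condition (39), t <= sqrt(4p/3), checked in integers as 3t^2 <= 4p."""
    t = multiplicative_order_of_2(p)
    return 3 * t * t <= 4 * p

def polymod(a, m):
    """Reduce a modulo m in F_2[x]."""
    while a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

def polygcd(a, b):
    """Euclidean algorithm in F_2[x]."""
    while b:
        a, b = b, polymod(a, b)
    return a

def three_roots_sum_to_zero(p):
    """True if gcd(x^p + 1, (x+1)^p + 1) over F_2 is nontrivial."""
    x_p_plus_1 = (1 << p) | 1
    shifted = 1
    for _ in range(p):
        shifted ^= shifted << 1  # multiply by (x + 1) over F_2
    shifted ^= 1                 # (x + 1)^p + 1
    return polygcd(x_p_plus_1, shifted) != 1

def odd_primes_below(n):
    """Odd primes below n by trial division (adequate for small n only)."""
    return [q for q in range(3, n, 2)
            if all(q % d for d in range(3, int(q ** 0.5) + 1, 2))]

survivors = [q for q in odd_primes_below(100) if passes_lemma23(q)]
confirmed = [q for q in survivors if three_roots_sum_to_zero(q)]
print(survivors, confirmed)
```

A common root $\zeta$ of the two polynomials gives $\zeta + 1 + (\zeta + 1) = 0$ with all three terms in $C_p$, which is why a nontrivial gcd certifies three $p$-th roots of unity summing to zero.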

5.4 Summary 

In the beginning of section 5 we argued that in order to use the method of [34] (i.e., proposition 5) to obtain
$k$-query locally decodable codes of length $\exp(n^{\epsilon})$ for some fixed $k$ and all $\epsilon > 0$, one needs to exhibit a $k$-nice
sequence of subsets of finite fields. In what follows we use technical results of the previous subsections to show
that the existence of a $k$-nice sequence implies that infinitely many Mersenne numbers have large prime factors.

Theorem 24 Let $k$ be odd. Suppose there exists a $k$-nice sequence of subsets of finite fields; then for infinitely
many values of $t$ we have

$$P(2^t - 1) \geq (t/2)^{1 + 1/(k-2)}. \tag{40}$$

Proof: Using lemmas 21 and 22 we conclude that a $k$-nice sequence yields infinitely many primes $p$ such that
$\operatorname{ord}_2(p) \leq 2p^{1 - 1/(k-1)}$. Let $p$ be such a prime and $t = \operatorname{ord}_2(p)$. Then $p$ divides $2^t - 1$, so $P(2^t - 1) \geq p \geq (t/2)^{1 + 1/(k-2)}$. ■

A combination of lemmas 21 and 23 yields a slightly stronger bound for the special case of 3-nice sequences. 

Theorem 25 Suppose there exists a 3-nice sequence of subsets; then for infinitely many values of $t$ we have

$$P(2^t - 1) \geq (3/4)t^2. \tag{41}$$
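The exponent arithmetic behind (40) and (41) can be spelled out as follows, writing $t = \operatorname{ord}_2(p)$, so that $p$ divides $2^t - 1$ and hence $P(2^t - 1) \geq p$. This is only a routine rearrangement of (37) (for $k \geq 3$) and (39), included as a reading aid.

```latex
\begin{align*}
t \le 2p^{\,1-\frac{1}{k-1}} = 2p^{\frac{k-2}{k-1}}
  &\iff \left(\tfrac{t}{2}\right)^{\frac{k-1}{k-2}} \le p
  \iff p \ge \left(\tfrac{t}{2}\right)^{1+\frac{1}{k-2}}, \\
t \le \left(\tfrac{4}{3}\,p\right)^{1/2}
  &\iff t^2 \le \tfrac{4}{3}\,p
  \iff p \ge \tfrac{3}{4}\,t^2 .
\end{align*}
```

Since each $2^t - 1$ has only finitely many prime factors, infinitely many primes $p$ of this kind give rise to infinitely many distinct values of $t$.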

We would like to remind the reader that although the lower bounds for $P(2^t - 1)$ given by (40) and (41) are
extremely weak in light of the widely accepted conjecture saying that the number of Mersenne primes is infinite,
they are substantially stronger than what is currently known unconditionally (2).

6 Conclusion 

Recently [34] came up with a novel technique for constructing locally decodable codes and obtained vast
improvements upon the earlier work. The construction proceeds in two steps. First [34] shows that if there exist
subsets of finite fields with certain 'nice' properties then there exist good codes. Next [34] constructs nice subsets
of prime fields $\mathbb{F}_p$ for Mersenne primes $p$.



In this paper we have undertaken an in-depth study of nice subsets of general finite fields. We have shown 
that constructing nice subsets is closely related to proving lower bounds on the size of largest prime factors of 
Mersenne numbers. Specifically, we extended the constructions of [34] to obtain nice subsets of prime fields $\mathbb{F}_p$
for primes $p$ that are large factors of Mersenne numbers. This implies that strong lower bounds for the size of the
largest prime factors of Mersenne numbers yield better locally decodable codes. Conversely, we argued that if one
can obtain codes of subexponential length and constant query complexity through nice subsets of finite fields then
infinitely many Mersenne numbers have prime factors larger than currently known.